The Knowledge-Gradient Policy for Correlated Normal Beliefs
Abstract
We consider a Bayesian ranking and selection problem with independent normal rewards and a correlated multivariate normal belief on the mean values of these rewards. Because this formulation of the ranking and selection problem models dependence between alternatives’ mean values, algorithms may utilize this dependence to perform efficiently even when the number of alternatives is very large. We propose a fully sequential sampling policy called the knowledge-gradient policy, which is provably optimal in some special cases and has bounded suboptimality in all others. We then demonstrate how this policy may be applied to efficiently maximize a continuous function on a continuous domain while constrained to a fixed number of noisy measurements.
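The belief model described above can be illustrated with a minimal sketch of the standard conjugate update for a multivariate normal belief after one noisy measurement. This is not the knowledge-gradient policy itself, only the update step that lets one measurement change the beliefs about correlated alternatives. The function name `update_belief`, its parameters, and the numbers in the example are assumptions made for illustration and do not come from the paper.

```python
def update_belief(mu, sigma, x, y, noise_var):
    """Condition a belief N(mu, sigma) on one noisy observation.

    Assumption for illustration: y is drawn from N(theta_x, noise_var),
    where theta_x is the unknown mean of alternative x.
    mu is a list of prior means, sigma a list-of-lists prior covariance.
    """
    n = len(mu)
    # Column x of the prior covariance: how each alternative co-varies with x.
    col = [sigma[i][x] for i in range(n)]
    # Predictive variance of the observation y.
    denom = noise_var + sigma[x][x]
    # Scaled surprise: (observed - expected) / predictive variance.
    gain = (y - mu[x]) / denom
    new_mu = [mu[i] + gain * col[i] for i in range(n)]
    new_sigma = [
        [sigma[i][j] - col[i] * col[j] / denom for j in range(n)]
        for i in range(n)
    ]
    return new_mu, new_sigma


# Hypothetical prior: alternatives 0 and 1 are correlated, 2 is independent.
mu = [0.0, 0.0, 0.0]
sigma = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Measure alternative 0 and observe y = 2.0 with noise variance 1.0.
new_mu, new_sigma = update_belief(mu, sigma, x=0, y=2.0, noise_var=1.0)
print(new_mu)
print(new_sigma)
```

In this example the measurement of alternative 0 also shifts the mean and shrinks the variance of alternative 1, while alternative 2 is left unchanged. That sharing of information is what allows a policy to work with far fewer measurements than alternatives.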
Similar resources
Optimal learning for sequential sampling with non-parametric beliefs
We propose a sequential learning policy for ranking and selection problems, where we use a non-parametric procedure for estimating the value of a policy. Our estimation approach aggregates over a set of kernel functions in order to achieve a more consistent estimator. Each element in the kernel estimation set uses a different bandwidth to achieve better aggregation. The final estimate uses a weigh...
The knowledge gradient algorithm for online learning
We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. The resulting decision rule easily extends to a variety of settings, including the case where our prior beliefs about the rewards are correlated. Experiments show that the KG policy performs competitively against other learning policies in diverse situations. In the ca...
The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against the best known approximation to the opti...
Online Supplement to “The Knowledge-Gradient Policy for Correlated Normal Beliefs”
As discussed in Section 3 of the main paper, the KG policy possesses several optimality and convergence properties. First, it is optimal by construction when N = 1 (Remark 1). Second, the suboptimality gap between the values of the KG and the optimal policies narrows to 0 as N → ∞ (Theorem 4). This is a convergence result, since it shows that when sampling under the KG policy we are guaranteed to...
Factors affecting the production of evidence-based policy documents at the headquarters of the Ministry of Health and Medical Education
Introduction: Successfully reducing the gap between applied knowledge and pure knowledge depends on identifying the factors that affect it. The objective of the study was to identify the barriers and facilitators to the development of evidence-based papers from the perspective of their producers at the headquarters of the Ministry of Health and Medical Education. Methods: Qualitativ...
Journal title:
- INFORMS Journal on Computing
Volume 21
Publication date: 2009